Go
Speculative Monte-Carlo Tree Search
Scott Cheng, Ding-Yong Hong
Monte-Carlo tree search (MCTS) is an influential sequential decision-making algorithm notably employed in AlphaZero. Despite its success, the primary challenge in AlphaZero training lies in its prolonged time-to-solution due to the high latency imposed by the sequential MCTS process. To address this challenge, this paper proposes and evaluates an inter-decision parallelization strategy called speculative MCTS, a new form of parallelism for AlphaZero based on speculative execution. This approach allows future moves to be searched in parallel before the current MCTS computations are completed, thus reducing latency. Additionally, we analyze the factors contributing to the overall speedup by studying the synergistic effects of speculation and neural network caching in MCTS. We also provide an analytical model that can be used to evaluate the potential of different speculation strategies before they are implemented and deployed. Our empirical findings indicate that the proposed speculative MCTS can reduce training latency by 5.81x in 9x9 Go games. Moreover, our study shows that speculative execution can enhance the NN cache hit rate by 26% during the midgame. Overall, our end-to-end evaluation indicates a 1.91x speedup in 19x19 Go training time compared to the state-of-the-art KataGo program.
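To make the inter-decision idea concrete, below is a minimal Python sketch of speculation in an AlphaZero-style self-play loop: while the real search for the current position runs, a background search is launched on a cheaply predicted future position, and both share an NN evaluation cache. The helpers `mcts_search`, `quick_policy_move`, and the `state.play` interface are hypothetical placeholders, not the paper's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def speculative_self_play(state, mcts_search, quick_policy_move, net, n_moves):
    """Overlap the search of a predicted future position with the current search."""
    cache = {}                       # NN evaluation cache shared by both searches
    pool = ThreadPoolExecutor(max_workers=1)
    spec_state, spec_future = None, None
    for _ in range(n_moves):
        # Speculate: guess the move the real search is likely to return (cheap
        # raw-policy argmax) and start searching the resulting position now.
        guess = state.play(quick_policy_move(state, net))
        spec_state = guess
        spec_future = pool.submit(mcts_search, guess, net, cache)
        move = mcts_search(state, net, cache)   # real search overlaps speculation
        state = state.play(move)
        if state == spec_state:
            # Speculation hit: the next decision was already computed in parallel.
            state = state.play(spec_future.result())
        # On a miss the speculative result is discarded, but its NN evaluations
        # remain in the shared cache and can still raise the cache hit rate.
    pool.shutdown(wait=False)
    return state
```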
Dual Policy Iteration
Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Bagnell
A novel class of Approximate Policy Iteration (API) algorithms has recently demonstrated impressive practical performance (e.g., ExIt [1], AlphaGo-Zero [2]). This new family of algorithms maintains, and alternately optimizes, two policies: a fast, reactive policy (e.g., a deep neural network) deployed at test time, and a slow, non-reactive policy (e.g., tree search) that can plan multiple steps ahead. The reactive policy is updated under supervision from the non-reactive policy, while the non-reactive policy is improved via guidance from the reactive policy. In this work we study this class of Dual Policy Iteration (DPI) strategies in an alternating optimization framework and provide a convergence analysis that extends existing API theory. We also develop a special instance of this framework which reduces the update of non-reactive policies to model-based optimal control using learned local models, and provides a theoretically sound way of unifying model-free and model-based RL approaches with unknown dynamics. We demonstrate the efficacy of our approach on various continuous control Markov Decision Processes.
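The alternating structure can be sketched as a simple loop, shown below with hypothetical components: `reactive` is the fast parametric policy, `expert_plan` stands in for the slow non-reactive planner, and `fit_local_model` fits local dynamics models around the visited states. The interfaces (e.g., `env.step` returning a state and a done flag) are illustrative assumptions, not the paper's API.

```python
def dual_policy_iteration(env, reactive, expert_plan, fit_local_model,
                          n_iters=10, horizon=100):
    """Alternate between improving the slow planner and distilling it into the fast policy."""
    for _ in range(n_iters):
        # 1) Roll out the fast reactive policy to collect on-policy states.
        states, state = [], env.reset()
        for _ in range(horizon):
            states.append(state)
            state, done = env.step(reactive.act(state))
            if done:
                break
        # 2) Improve the slow, non-reactive policy where the reactive policy
        #    actually goes, e.g. by fitting local dynamics models there and
        #    planning against them (model-based optimal control).
        local_model = fit_local_model(states)
        # 3) Update the reactive policy under supervision from the planner's
        #    actions on those same states (a supervised distillation step).
        targets = [expert_plan(local_model, s) for s in states]
        reactive.fit(states, targets)
    return reactive
```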
Are AlphaZero-like Agents Robust to Adversarial Perturbations?
The success of AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin. However, do these superhuman AZ agents truly learn some general basic knowledge that can be applied to any legal state? In this paper, we first extend the concept of adversarial examples to the game of Go: we generate perturbed states that are "semantically" equivalent to the original state by adding meaningless actions to the game, and an adversarial state is a perturbed state that leads to an undoubtedly inferior action, one that is obvious even to amateur players. However, searching for adversarial states is challenging due to the large, discrete, and non-differentiable search space. To tackle this challenge, we develop the first adversarial attack on Go AIs that can efficiently search for adversarial states by strategically reducing the search space. This method can also be extended to other board games such as NoGo. Experimentally, we show that both the Policy-Value neural network (PV-NN) and Monte Carlo tree search (MCTS) can be misled by adding one or two meaningless stones; for example, in 58% of the AlphaGo Zero self-play games, our method makes the widely used KataGo agent with 50 MCTS simulations play a losing action by adding two meaningless stones. We additionally evaluated the adversarial examples found by our algorithm with amateur human Go players, and 90% of the examples indeed lead the Go agent to play an obviously inferior action. Our code is available at https://PaperCode.cc/GoAttack.
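A rough sketch of the attack loop is given below, under assumed helpers (`agent.best_move`, `agent.value`, `candidate_points`, `add_meaningless_stone`) and a simple value-drop criterion for "obviously inferior"; the paper's actual search-space reduction strategy is more involved than this enumeration.

```python
def find_adversarial_state(state, agent, candidate_points, value_drop=0.5,
                           max_stones=2):
    """Search for a perturbed-but-equivalent state that misleads the agent."""
    baseline = agent.value(state, agent.best_move(state))
    frontier = [state]
    for _ in range(max_stones):                      # add at most one or two stones
        next_frontier = []
        for s in frontier:
            for point in candidate_points(s):        # pruned candidate set, not all legal points
                perturbed = s.add_meaningless_stone(point)   # semantically equivalent state
                move = agent.best_move(perturbed)
                if agent.value(perturbed, move) < baseline - value_drop:
                    return perturbed                 # agent now picks a clearly inferior move
                next_frontier.append(perturbed)
        frontier = next_frontier
    return None
```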
Policy Mirror Descent with Lookahead
Policy Mirror Descent (PMD) stands as a versatile algorithmic framework encompassing several seminal policy gradient algorithms, such as natural policy gradient, with connections to state-of-the-art reinforcement learning (RL) algorithms such as TRPO and PPO. PMD can be seen as a soft Policy Iteration algorithm implementing regularized 1-step greedy policy improvement. However, 1-step greedy policies might not be the best choice, and recent remarkable empirical successes in RL such as AlphaGo and AlphaZero have demonstrated that greedy approaches with respect to multiple steps outperform their 1-step counterparts. In this work, we propose a new class of PMD algorithms, called h-PMD, which incorporates multi-step greedy policy improvement with lookahead depth h into the PMD update rule.
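For intuition, here is a tabular sketch of one h-PMD update with known dynamics, using a negative-entropy mirror map (a multiplicative-weights step); the paper's sample-based setting and its exact step-size and regularization choices are not reproduced here.

```python
import numpy as np

def h_pmd_update(pi, P, r, gamma, h, eta):
    """One h-step lookahead mirror-descent update on a tabular policy pi[s, a].

    P[s, a, s'] are transition probabilities, r[s, a] rewards; h=1 recovers
    the usual 1-step greedy (soft policy iteration) update.
    """
    S, A = r.shape
    # Evaluate the current policy: V_pi = (I - gamma * P_pi)^{-1} r_pi.
    P_pi = np.einsum('sa,sab->sb', pi, P)
    r_pi = np.einsum('sa,sa->s', pi, r)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    # h-step lookahead: apply the Bellman optimality operator h-1 times ...
    for _ in range(h - 1):
        V = np.max(r + gamma * np.einsum('sab,b->sa', P, V), axis=1)
    # ... and use the resulting lookahead Q-values as the update direction.
    Q_h = r + gamma * np.einsum('sab,b->sa', P, V)
    # Mirror-descent step with negative entropy: multiplicative weights + renormalize.
    new_pi = pi * np.exp(eta * Q_h)
    return new_pi / new_pi.sum(axis=1, keepdims=True)
```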
Efficient Reinforcement Learning by Discovering Neural Pathways
Reinforcement learning (RL) algorithms have been very successful at tackling complex control problems, such as AlphaGo or fusion control. However, current research mainly emphasizes solution quality, often achieved by using large models trained on large amounts of data, and does not account for the financial, environmental, and societal costs associated with developing and deploying such models. Modern neural networks are often overparameterized, and a significant number of parameters can be pruned without meaningful loss in performance, resulting in more efficient use of the model's capacity. We present a methodology for identifying sparse sub-networks, which we call neural pathways, within a larger RL network. We show empirically that even very small learned sub-networks, using less than 5% of the large network's parameters, can provide very good quality solutions. We also demonstrate the training of multiple pathways within the same network in a multi-task setup, where each pathway tackles a separate task. We empirically evaluate our approach on several continuous control tasks, in both online and offline settings.
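One simple way to obtain such a sparse sub-network is a magnitude-based mask, sketched below in PyTorch; this is a generic illustration of extracting and applying a pathway, not necessarily the paper's discovery procedure.

```python
import torch
import torch.nn as nn

def magnitude_pathway_masks(model: nn.Module, keep_fraction: float = 0.05):
    """Keep only the largest-magnitude `keep_fraction` of each weight matrix."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:          # skip biases / norm parameters in this sketch
            continue
        k = max(1, int(keep_fraction * param.numel()))
        threshold = param.detach().abs().flatten().topk(k).values.min()
        masks[name] = (param.detach().abs() >= threshold).float()
    return masks

def apply_pathway(model: nn.Module, masks: dict) -> nn.Module:
    """Zero out all weights outside the pathway, in place."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
    return model
```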
The evolution of AI: From AlphaGo to AI agents, physical AI, and beyond
The release of ChatGPT by OpenAI in November 2022 marked another significant milestone in the evolution of AI. ChatGPT, a large language model capable of generating human-like text, demonstrated the potential of AI to understand and generate natural language. This capability opened up new possibilities for AI applications, from customer service to content creation. The world responded to ChatGPT with a mix of awe and excitement, recognizing the potential of AI to transform how humans communicate and interact with technology to enhance our lives. Today, the rise of agentic AI -- systems capable of advanced reasoning and task execution -- is revolutionizing the way organizations operate.
Reinforcement Learning in Strategy-Based and Atari Games: A Review of Google DeepMind's Innovations
Abdelrhman Shaheen, Anas Badr, Ali Abohendy, Hatem Alsaadawy, Nadine Alsayad
Reinforcement Learning (RL) has been widely used in many applications, particularly in gaming, which serves as an excellent training ground for AI models. Google DeepMind has pioneered innovations in this field, employing reinforcement learning algorithms, including model-based, model-free, and deep Q-network approaches, to create advanced AI models such as AlphaGo, AlphaGo Zero, and MuZero. AlphaGo, the initial model, integrates supervised learning and reinforcement learning to master the game of Go, surpassing professional human players. AlphaGo Zero refines this approach by eliminating reliance on human gameplay data, instead utilizing self-play for enhanced learning efficiency. MuZero further extends these advancements by learning the underlying dynamics of game environments without explicit knowledge of the rules, achieving adaptability across various games, including complex Atari games. This paper reviews the significance of reinforcement learning applications in Atari and strategy-based games, analyzing these three models, their key innovations, training processes, challenges encountered, and improvements made. Additionally, we discuss advancements in the field of gaming, including MiniZero and multi-agent models, highlighting future directions and emerging AI models from Google DeepMind.
Accelerating Monte Carlo Tree Search with Probability Tree State Abstraction
Monte Carlo Tree Search (MCTS) algorithms such as AlphaGo and MuZero have achieved superhuman performance in many challenging tasks. However, the computational complexity of MCTS-based algorithms is influenced by the size of the search space. To address this issue, we propose a novel probability tree state abstraction (PTSA) algorithm to improve the search efficiency of MCTS. A general tree state abstraction with path transitivity is defined. In addition, the probability tree state abstraction is proposed to reduce mistakes during the aggregation step. Furthermore, theoretical guarantees for transitivity and the aggregation error bound are established. To evaluate the effectiveness of the PTSA algorithm, we integrate it with state-of-the-art MCTS-based algorithms such as Sampled MuZero and Gumbel MuZero. Experimental results on different tasks demonstrate that our method can accelerate the training process of state-of-the-art algorithms with a 10%-45% reduction of the search space.
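As a toy illustration of tree state abstraction, the sketch below aggregates search-tree nodes whose predicted policy distributions are within a total-variation tolerance, so that their visit statistics could be shared; the actual PTSA criterion and its transitivity guarantees differ from this simple rule.

```python
import numpy as np

def aggregate_nodes(nodes, policies, tol=0.05):
    """Group nodes whose policy distributions differ by less than `tol` in total variation."""
    groups = []                                   # list of (representative policy, member nodes)
    for node, p in zip(nodes, policies):
        for rep_policy, members in groups:
            if 0.5 * np.abs(rep_policy - p).sum() < tol:   # total-variation distance
                members.append(node)              # merge into an existing abstract state
                break
        else:
            groups.append((p, [node]))            # start a new abstract state
    return [members for _, members in groups]
```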